33 research outputs found

    Backup without redundancy: genetic interactions reveal the cost of duplicate gene loss.

    Many genes can be deleted with little phenotypic consequence. By what mechanism and to what extent the presence of duplicate genes in the genome contributes to this robustness against deletions has been the subject of considerable interest. Here, we exploit the availability of high-density genetic interaction maps to provide direct support for the role of backup compensation, where functionally overlapping duplicates cover for the loss of their paralog. However, we find that the overall contribution of duplicates to robustness against null mutations is low (approximately 25%). The ability to directly identify buffering paralogs allowed us to further study their properties, and how they differ from non-buffering duplicates. Using environmental sensitivity profiles as well as quantitative genetic interaction spectra as high-resolution phenotypes, we establish that even duplicate pairs with compensation capacity exhibit rich and typically non-overlapping deletion phenotypes, and are thus unable to comprehensively cover against loss of their paralog. Our findings reconcile the fact that duplicates can compensate for each other's loss under a limited number of conditions with the evolutionary instability of genes whose loss is not associated with a phenotypic penalty.

    Similarities and Differences in Genome-Wide Expression Data of Six Organisms

    Comparing genomic properties of different organisms is of fundamental importance in the study of biological and evolutionary principles. Although differences among organisms are often attributed to differential gene expression, genome-wide comparative analysis thus far has been based primarily on genomic sequence information. We present a comparative study of large datasets of expression profiles from six evolutionarily distant organisms: S. cerevisiae, C. elegans, E. coli, A. thaliana, D. melanogaster, and H. sapiens. We use genomic sequence information to connect these data and compare global and modular properties of the transcription programs. Linking genes whose expression profiles are similar, we find that for all organisms the connectivity distribution follows a power law, highly connected genes tend to be essential and conserved, and the expression program is highly modular. We reveal the modular structure by decomposing each set of expression data into coexpressed modules. Functionally related sets of genes are frequently coexpressed in multiple organisms. Yet their relative importance to the transcription program and their regulatory relationships vary among organisms. Our results demonstrate the potential of combining sequence and expression data for improving functional gene annotation and expanding our understanding of how gene expression and diversity evolved.
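
    The linking step described above can be illustrated with a minimal sketch. It assumes Pearson correlation as the similarity measure and a fixed cutoff for drawing a link; the abstract specifies neither, and the gene names and numbers are made up for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def connectivity(profiles, cutoff):
    """Link genes whose profiles correlate above `cutoff`; return each gene's degree."""
    degree = {gene: 0 for gene in profiles}
    for g1, g2 in combinations(profiles, 2):
        if pearson(profiles[g1], profiles[g2]) > cutoff:
            degree[g1] += 1
            degree[g2] += 1
    return degree

# Toy profiles (hypothetical genes, four conditions each).
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [4, 3, 2, 1],
    "geneD": [1, 3, 2, 4],
}
degree = connectivity(profiles, cutoff=0.9)
```

    On real data, the histogram of these degrees is the connectivity distribution one would inspect for power-law behaviour.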

    The Iterative Signature Algorithm for the analysis of large scale gene expression data

    We present a new approach for the analysis of genome-wide expression data. Our method is designed to overcome the limitations of traditional techniques when applied to large-scale data. Rather than allotting each gene to a single cluster, we assign both genes and conditions to context-dependent and potentially overlapping transcription modules. We provide a rigorous definition of a transcription module as the object to be retrieved from the expression data. We establish an efficient algorithm that searches for the modules encoded in the data by iteratively refining sets of genes and conditions until they match this definition. Each iteration involves a linear map, induced by the normalized expression matrix, followed by the application of a threshold function. We argue that our method is in fact a generalization of Singular Value Decomposition, which corresponds to the special case where no threshold is applied. We show analytically that for noisy expression data our approach leads to better classification due to the implementation of the threshold. This result is confirmed by numerical analyses based on in-silico expression data. We discuss briefly results obtained by applying our algorithm to expression data from the yeast S. cerevisiae. Comment: LaTeX, 36 pages, 8 figures
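
    The iteration described above (a linear map followed by a threshold, repeated until the gene set stops changing) can be sketched roughly as follows. This is a simplified illustration, not the published algorithm: it assumes binary set membership instead of weighted scores, a single per-gene z-score normalization, and a one-sided threshold at mean + t standard deviations. The function names and the toy matrix are hypothetical.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize a profile to zero mean and unit variance (all zeros if constant)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]

def above_threshold(scores, t):
    """Indices whose score exceeds the mean by more than t standard deviations."""
    m, s = mean(scores), pstdev(scores)
    return {i for i, v in enumerate(scores) if v > m + t * s}

def isa_sketch(expr, seed_genes, t_gene=0.5, t_cond=0.5, max_iter=50):
    """Refine a seed gene set into a (genes, conditions) pair that is a fixed point."""
    norm = [zscore(row) for row in expr]  # assumed normalization: per gene
    n_conds = len(expr[0])
    genes, conds = set(seed_genes), set()
    for _ in range(max_iter):
        # Linear map genes -> conditions, then threshold.
        cond_scores = [sum(norm[g][c] for g in genes) for c in range(n_conds)]
        conds = above_threshold(cond_scores, t_cond)
        # Linear map conditions -> genes, then threshold.
        gene_scores = [sum(row[c] for c in conds) for row in norm]
        new_genes = above_threshold(gene_scores, t_gene)
        if new_genes == genes:  # fixed point reached
            break
        genes = new_genes
    return genes, conds

# Toy matrix (rows: genes, columns: conditions); genes 0-2 respond to conditions 0-1.
expr = [
    [5, 5, 0, 0, 0, 0],
    [5, 5, 0, 0, 0, 0],
    [5, 5, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
]
module = isa_sketch(expr, seed_genes={0})
```

    Starting from gene 0 alone, the sketch recovers genes 0 to 2 together with conditions 0 and 1; different seeds can converge to different, possibly overlapping, modules.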

    A Human-Curated Annotation of the Candida albicans Genome

    Recent sequencing and assembly of the genome for the fungal pathogen Candida albicans used simple automated procedures for the identification of putative genes. We have reviewed the entire assembly, both by hand and with additional bioinformatic resources, to accurately map and describe 6,354 genes and to identify 246 genes whose original database entries contained sequencing errors (or possibly mutations) that affect their reading frame. Comparison with other fungal genomes permitted the identification of numerous fungus-specific genes that might be targeted for antifungal therapy. We also observed that, compared to other fungi, the protein-coding sequences in the C. albicans genome are especially rich in short sequence repeats. Finally, our improved annotation permitted a detailed analysis of several multigene families, and comparative genomic studies showed that C. albicans has a far greater catabolic range, encoding respiratory Complex 1, several novel oxidoreductases and ketone body degrading enzymes, malonyl-CoA and enoyl-CoA carriers, several novel amino acid degrading enzymes, a variety of secreted catabolic lipases and proteases, and numerous transporters to assimilate the resulting nutrients. The results of these efforts will ensure that the Candida research community has uniform and comprehensive genomic information for medical research as well as for future diagnostic and therapeutic applications.

    Defining transcription modules using large-scale gene expression data

    Running title: Defining modules using large-scale expression data
    Motivation: Large-scale gene expression data comprising a variety of cellular conditions holds the promise of a global view on the transcription program. While conventional clustering algorithms have been successfully applied to smaller datasets, the utility of many algorithms for the analysis of large-scale data is limited by their inability to capture combinatorial and condition-specific co-regulation. In addition, there is an increasing need to integrate the rapidly accumulating body of other high-throughput biological data with the expression analysis. In a previous work, we introduced the Signature Algorithm, which overcomes the problems of conventional clustering and allows for intuitive integration of additional biological data. However, the applicability of this approach to global analyses is constrained by the comprehensiveness of relevant external data and by its inability to capture the hierarchical organization of the transcription network.
    Methods: We present a novel method for the analysis of large-scale expression data, which assigns genes into context-dependent and potentially overlapping regulatory units. We introduc

    Regulatory Relations between Modules

    A selection of eight transcription modules whose function is known in yeast was used to generate the corresponding (refined) homologue modules in the other five organisms. Each module is associated with a “condition profile” generated by the signature algorithm based on the expression data.
    (A) Correlations between these profiles were calculated for all pairs of modules in each organism. Note that for E. coli there is no proteasome and that the mitochondrial ribosomal proteins (MRPs) correspond to ribosomal genes. Modules are represented by circles (legend). Significantly correlated or significantly anticorrelated modules are connected by colored lines indicating their correlation (color bar). Positively correlated modules are placed close to each other, while a large distance reflects anticorrelation. See Figure S11 for a numerical tabulation of all pairwise correlations.
    (B and C) Correlations between pairs of modules according to the cell-cycle data as a function of their correlation in the full data. Each circle corresponds to a pair of S. cerevisiae modules (B) or human modules (C).
    (D) To check the sensitivity of our results with respect to the size of the dataset, we reevaluated the correlations between the sets of conditions for randomly selected subsets of the data. Shown are the mean and standard deviation of the correlation coefficient between the heat-shock and protein-synthesis modules as a function of the fraction of removed conditions (see Figures S4 and S5 for correlations between other module pairs).

    Using Expression Data to Identify and Refine Sequence-Based Functional Assignments

    (A) Starting from a set of coexpressed genes (yellow dots in left box) associated with a particular function in organism A, we first identify the homologues in organism B using BLAST (middle box). Only some of these homologues are coexpressed while others are not (blue dots). The signature algorithm selects this coexpressed subset and adds further genes (light yellow) that were not identified based on sequence, but share similar expression profiles (right box).
    (B) The 15 coexpressed genes associated with heat shock in yeast (center) have eight homologues in E. coli (left) and 14 in C. elegans (right). Among the ten genes whose expression profiles are the most similar to these homologues (bottom), many are known to be associated with heat-shock response (boldface).
    (C) For each of the six organisms, the distribution of the Z-scores for the average gene–gene correlation of all the “homologue modules” (see Materials and Methods: http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.0020009#s4) obtained from the yeast modules is shown (top). Rejecting the homologues that are not coexpressed gives rise to the “purified modules,” whose Z-scores generally are larger (except for the yeast modules, which contain only coexpressed genes from the beginning). Adding further coexpressed genes yields the “refined modules,” which have significantly larger Z-scores (bottom).
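
    As a rough illustration of the quantity in panel (C), the sketch below scores a gene set by its average pairwise correlation and expresses that as a Z-score against other gene sets of the same size. The null model here (every same-size subset of a small toy dataset, enumerated exhaustively) is an assumption made for the example; the actual definition is given in the cited Materials and Methods, which is not reproduced in this listing. All names and numbers are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def avg_correlation(profiles, genes):
    """Average gene-gene correlation over all pairs in `genes` (needs >= 2 genes)."""
    return mean(pearson(profiles[a], profiles[b]) for a, b in combinations(genes, 2))

def module_zscore(profiles, module):
    """Z-score of the module's average correlation against all same-size gene sets."""
    observed = avg_correlation(profiles, module)
    # Assumed null: exhaustive enumeration, feasible only for toy-sized data.
    null = [avg_correlation(profiles, subset)
            for subset in combinations(range(len(profiles)), len(module))]
    return (observed - mean(null)) / pstdev(null)

# Toy data: genes 0-2 are perfectly coexpressed, genes 3-5 are not.
profiles = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [0, 1, 2, 3],
    [4, 3, 2, 1],
    [1, 3, 2, 4],
    [2, 1, 4, 3],
]
z = module_zscore(profiles, module=(0, 1, 2))
```

    Dropping a non-coexpressed gene from a set raises its average correlation and hence its Z-score, which is the effect the “purified” and “refined” modules are reported to show.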